Abstract

Analysis of the retrieval architecture of the highly influential UNIX 
file system [?] provides insight into design methods, constraints,
and possible alternatives. The basic architecture can be understood 
in terms of function composition and recursion by anyone with some 
mathematical maturity. Expertise in operating system coding or in 
any specialized “formal method” is not required. 


1 Basics 

The retrieval (read-only) operation of a computer
file system can be represented by a map:

$$\mathit{File} : \mathit{Identifiers} \to \mathit{Contents}.$$

A disk drive corresponds to a much simpler 
map 

$$\mathit{Disk} : \mathit{BlockNumbers} \to \mathit{Blocks}$$

where blocks are just fixed-length sequences
of “bytes” (8 adjacent binary digits). Until recently, file systems had to
be designed around the problem of building a reasonably sophisticated map
of the first kind from the simple operation of the second map. The UNIX file
system [?] supports variable-sized files, from small text files to videos and
databases, and also organizes files into a tree structure so that file names
describe paths through the tree. The map looks like

$$F : \mathit{Paths}(X) \to \mathit{Contents}.$$

The set $\mathit{Paths}(X)$ is the set of finite sequences of non-null strings over some
alphabet $X$. Implementations use special characters as separators (e.g.
/a/b/c, where / is the separator), but fundamentally paths are sequences of strings.^1

The UNIX designers split the problem of building this system on top of a 
disk drive into two conceptually distinct problems. First they looked at how 
to get past the fixed size:


$$\alpha : \mathit{Indexes} \to \mathit{Contents}$$

where $\mathit{Indexes}$ is a set of numbers and $\mathit{Contents}$ includes, at least, all
sequences of bytes within the limits of the file system. The second step is to
embed the tree into this first system:

$$\beta : \mathit{Paths}(X) \to \mathit{Indexes}$$

The file system map is then given by $F(p) = \alpha(\beta(p))$. The map $\beta$ relies on
a clever technique where $\mathit{Contents}$ is the union of the set of byte sequences
(ordinary files) and the set of maps $\mathit{Strings}(X) \to \mathit{Indexes}$ (directories). If
$\alpha(i)$ is a directory, then $\alpha(i)(x)$ applies the map that is the value of $\alpha(i)$ to
the argument $x$.

A recursive descent from an initial index processes strings starting on the
left. Let $\mathit{Null}$ denote the null path (0 strings). If $p$ is a path and $x$ is a
non-null string, then $xp$ is the path obtained by appending $x$ to $p$ on the left.
Then every finite path is either the null path or of the form $xp$.

$$\mathcal{N}(i, \mathit{Null}) = i$$
$$\mathcal{N}(i, xp) = \mathcal{N}(\alpha(i)(x), p)$$

At each step, $\mathcal{N}$ resolves the leftmost string in the path, assuming that $\alpha(i)$
is a directory that is defined on that string.

To define $\beta$, then, we just need to pick some $\mathit{root} \in \mathit{Indexes}$ and then:

$$\beta(p) = \mathcal{N}(\mathit{root}, p)$$
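
The recursion is short enough to try out directly. The following is a minimal Python sketch, not an actual implementation: it assumes that the map for indexes is a dictionary called `alpha`, that directories are dictionaries from names to indexes, that ordinary files are byte strings, and that paths are tuples of strings. The names `resolve`, `beta`, `F` and `ROOT`, and the sample tree, are illustrative choices.

```python
# Toy model: alpha maps index numbers to contents. A directory is a dict
# from name strings to indexes; an ordinary file is a bytes object.
ROOT = 0  # assumed root index

alpha = {
    0: {"etc": 1, "home": 2},   # root directory
    1: {"passwords": 3},        # /etc
    2: {},                      # /home (empty)
    3: b"secret\n",             # /etc/passwords
}


def resolve(i, path):
    """N(i, path): walk the path from index i, leftmost string first."""
    if not path:                        # N(i, Null) = i
        return i
    x, rest = path[0], path[1:]
    return resolve(alpha[i][x], rest)   # N(i, xp) = N(alpha(i)(x), p)


def beta(path):
    """beta(p) = N(root, p)."""
    return resolve(ROOT, path)


def F(path):
    """F(p) = alpha(beta(p))."""
    return alpha[beta(path)]
```

Under these assumptions `F(("etc", "passwords"))` returns the file's bytes and `F(("home",))` returns the empty directory map. A path that leaves the tree raises `KeyError`, which stands in for "undefined".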

^1 In practice, the length of paths and the lengths of strings in a path will be constrained
by some bound, but we don’t have to worry about that now either.


2 Consistency and extending the model

One useful property of $\mathcal{N}$ is that it is guaranteed to terminate:

If $p$ is $n$ elements long, $\mathcal{N}(i, p)$ terminates in at most $n + 1$ steps. (1)

This property assures implementors that their program will not cycle
infinitely trying to reduce a path to an index. A second property implicit in the
definition of $\mathcal{N}$ is that if a path names a file, every prefix of that path names
a directory. Let $px$ be the path obtained by appending $x \in \mathit{Strings}(X)$ to
$p \in \mathit{Paths}(X)$ on the right.

If $F(px)$ is defined then $F(p)$ is a directory. (2)

There are a number of additional properties that $\alpha$ and $\mathit{root}$ need to satisfy to
make this a reasonable file system. The most important are the “no orphans”
and “no dangling references” properties.

If $\alpha(i)$ is defined, there is some path $p$ s.t. $\mathcal{N}(\mathit{root}, p) = i$. (3)

$F(px)$ is defined if and only if $F(p)(x)$ is defined. (4)

There’s another useful pair of properties for two special strings, “.” (self)
and “..” (parent).

If $F(p)$ is a directory, then $F(p)(\texttt{"."}) = \mathcal{N}(\mathit{root}, p)$. (5)

If $F(px)$ is a directory, then $F(px)(\texttt{".."}) = \mathcal{N}(\mathit{root}, p)$. (6)

$\alpha(\mathit{root})(\texttt{".."}) = \mathit{root}$. (7)
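
These conventions are easy to check mechanically on a small example. The sketch below is only an illustration under assumptions: directories are Python dictionaries that store “.” and “..” as ordinary entries, paths are tuples of strings, and the root is its own parent. The names `resolve`, `is_dir` and `check_dots`, and the sample tree, are hypothetical.

```python
ROOT = 0  # assumed root index

# Each directory stores "." (self) and ".." (parent) as ordinary entries.
alpha = {
    0: {".": 0, "..": 0, "etc": 1},
    1: {".": 1, "..": 0, "passwords": 2},
    2: b"secret\n",
}


def resolve(i, path):
    """Iterative form of the recursive descent N(i, path)."""
    for x in path:
        i = alpha[i][x]
    return i


def is_dir(i):
    return isinstance(alpha[i], dict)


def check_dots(path):
    """Check the self and parent conventions for the directory named by path."""
    i = resolve(ROOT, path)
    if not is_dir(i):
        return True  # the properties only constrain directories
    # For the null path the expected parent is the root itself.
    parent = resolve(ROOT, path[:-1]) if path else ROOT
    return alpha[i]["."] == i and alpha[i][".."] == parent
```

With these entries in place, a path such as `("etc", "..", "etc", ".")` resolves to the same index as `("etc",)`.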


Do we want to permit aliases, that is, distinct paths that don’t contain any “..”
or “.” strings but that name the same file? Say a path is “dot free” if it doesn’t
contain either “.” or “..” as elements (other dots are ok). We can require
that for any two dot-free paths $p$, $q$: if $\mathcal{N}(\mathit{root}, p) = \mathcal{N}(\mathit{root}, q)$ then $p = q$.
The most important aspect of a property like this is the elimination of loops.
Consider a program that finds all the files that are rooted by a particular
folder. Let $\mathit{list}(F(p)) = \emptyset$ if $F(p)$ is not defined or is not a directory, and otherwise the set
of all strings $x \notin \{\texttt{"."}, \texttt{".."}\}$ for which $F(p)(x)$ is defined. Then:

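$$\mathit{find}(p) = \begin{cases}
\emptyset & \text{if } F(p) \text{ is not defined} \\
\{p\} & \text{if } F(p) \text{ is a regular file or } F(p) \text{ is the empty directory} \\
\{p\} \cup \bigcup_{x \in \mathit{list}(F(p))} \mathit{find}(px) & \text{otherwise}
\end{cases}$$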
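
As a rough illustration, the definition can be transcribed into Python. This is a sketch under assumptions rather than the UNIX `find` program: directories are dictionaries, ordinary files are byte strings, paths are tuples of strings, and `None` stands for "undefined". The names `resolve`, `list_names` and the sample tree are hypothetical.

```python
ROOT = 0  # assumed root index

alpha = {
    0: {".": 0, "..": 0, "etc": 1, "tmp": 3},
    1: {".": 1, "..": 0, "passwords": 2},
    2: b"secret\n",
    3: {".": 3, "..": 0},   # empty directory
}


def resolve(i, path):
    """N(i, path), returning None where the map is undefined."""
    for x in path:
        contents = alpha.get(i)
        if not isinstance(contents, dict) or x not in contents:
            return None
        i = contents[x]
    return i


def F(path):
    i = resolve(ROOT, path)
    return None if i is None else alpha.get(i)


def list_names(contents):
    """Names in a directory other than "." and ".."; empty for non-directories."""
    if not isinstance(contents, dict):
        return set()
    return {x for x in contents if x not in (".", "..")}


def find(path):
    contents = F(path)
    if contents is None:
        return set()                    # F(p) is not defined
    result = {path}                     # regular file or empty directory stops here
    for x in list_names(contents):
        result |= find(path + (x,))     # union over find(px)
    return result
```

The first two cases of the definition collapse into the loop running zero times. On a tree with a loop among dot-free paths this function would recurse until Python's recursion limit is reached.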

The question is whether find is guaranteed to terminate. Even if $F$ describes
a file system with a finite number of files (always the case in practice), loops
would cause find to build longer and longer paths without ever hitting one
of the first two terminating cases. If there are at most $n$ files in the system,
then a path of more than $n$ elements must visit the same directory twice,
implying there is an alias. So prohibiting aliases is sufficient to make
find and many other related algorithms work properly. The original UNIX
file system did not prohibit all aliases, but had a weaker constraint that is
enough to assure that find terminates. Define links as follows:

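$$\mathit{links}(i, j) = \begin{cases}
1 & \text{if } \alpha(i) \text{ is a directory and there is some } x \in \mathit{list}(\alpha(i)) \text{ s.t. } \alpha(i)(x) = j \\
0 & \text{otherwise}
\end{cases}$$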


The requirement is that $\sum_{j \in \mathit{Indexes}} \mathit{links}(j, i) < 2$ for all $i$ where $\alpha(i)$ is a
directory, plus a requirement that $\mathit{links}(j, \mathit{root}) = 1$ only if $j = \mathit{root}$. That is,
at most one directory $j$ links to directory $i$ and none link to the root directory.
This constraint is a consequence of the consistency properties above. Suppose
that $j_1 \neq j_2$ violated the constraint, so that $\alpha(j_1)(x) = \alpha(j_2)(y) = i$ and
$\alpha(i)$ is a directory. Because of property (3) there must be $p$ and $q$ so that
$\mathcal{N}(\mathit{root}, p) = j_1$ and $\mathcal{N}(\mathit{root}, q) = j_2$. But then $F(px)(\texttt{"."}) = F(qy)(\texttt{"."}) = i$
and $\alpha(i)(\texttt{".."}) = F(px)(\texttt{".."}) = F(qy)(\texttt{".."})$, so $j_1 = j_2$ by property (6), which contradicts the
premise. If $\mathit{links}(i, \mathit{root}) = 1$ then the same reasoning tells us $i = \mathit{root}$ by (7).
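
The link constraint can be checked by brute force on a small table. The following Python sketch assumes the same toy representation as before (directories as dictionaries that include “.” and “..”, ordinary files as byte strings). The function names and both sample trees are illustrative, not taken from any UNIX source.

```python
def list_names(contents):
    """Directory entries other than "." and ".."."""
    return {x for x in contents if x not in (".", "..")}


def links(alpha, i, j):
    """1 if directory i has a non-dot entry that maps to index j, else 0."""
    contents = alpha.get(i)
    if not isinstance(contents, dict):
        return 0
    return int(any(contents[x] == j for x in list_names(contents)))


def satisfies_link_constraint(alpha, root):
    for i, contents in alpha.items():
        if not isinstance(contents, dict):
            continue  # the constraint only covers directory targets
        if sum(links(alpha, j, i) for j in alpha) >= 2:
            return False
    # links(j, root) = 1 only if j = root
    return all(links(alpha, j, root) == 0 or j == root for j in alpha)


tree = {
    0: {".": 0, "..": 0, "etc": 1, "tmp": 3, "pw": 2},
    1: {".": 1, "..": 0, "passwords": 2},   # file 2 has two names: allowed
    2: b"secret\n",
    3: {".": 3, "..": 0},
}

aliased = dict(tree)
aliased[3] = {".": 3, "..": 0, "etc2": 1}   # a second directory links to 1
```

Note that the ordinary file at index 2 is reachable under two names in `tree` without violating the constraint, which only restricts links to directories.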

In later versions of UNIX things became more complex because of so-called
“soft links” (symbolic links). To include soft links in the current analysis,
add a third file type to ordinary files and directories: a soft link type
where the contents is just a path. Then modify $\mathcal{N}$ as follows:


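$$\mathcal{N}(i, xp) = \begin{cases}
\mathcal{N}(\alpha(i)(x), p) & \text{if } \alpha(i) \text{ is a directory map} \\
\mathcal{N}(\mathit{root}, \mathit{Concatenate}(\alpha(i), xp)) & \text{if } \alpha(i) \text{ is a soft link}
\end{cases}$$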


Sadly, this new version of $\mathcal{N}$ lacks property (1). The normal method for fixing
that is to count the number of soft links visited along a path.


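$$\mathcal{N}(i, p) = \mathcal{N}'(i, p, l)$$

$$\mathcal{N}'(i, xp, n) = \begin{cases}
\mathcal{N}'(\alpha(i)(x), p, n) & \text{if } \alpha(i) \text{ is a directory map} \\
\mathcal{N}'(\mathit{root}, \mathit{Concatenate}(\alpha(i), xp), n - 1) & \text{if } \alpha(i) \text{ is a soft link and } n > 0
\end{cases}$$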

Soft links, however, introduce loops into the tree and $\mathcal{N}$ might visit the same
directory twice.
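
A small Python sketch shows how the counter cuts off a looping soft link. It rests on assumptions of mine: a `SoftLink` class holding an absolute path as a tuple of strings, directories as dictionaries, and `None` for "undefined". The sample tree and the budget values are arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoftLink:
    target: tuple  # absolute path, as a tuple of strings


ROOT = 0  # assumed root index

alpha = {
    0: {"etc": 1, "cfg": 3, "loop": 4},
    1: {"passwords": 2},
    2: b"secret\n",
    3: SoftLink(("etc",)),    # /cfg  -> /etc
    4: SoftLink(("loop",)),   # /loop -> /loop, a cycle
}


def resolve(i, path, n):
    """Counted descent: n is the number of soft links still allowed."""
    if not path:
        return i
    contents = alpha[i]
    if isinstance(contents, dict):
        x, rest = path[0], path[1:]
        if x not in contents:
            return None
        return resolve(contents[x], rest, n)
    if isinstance(contents, SoftLink) and n > 0:
        # Restart from the root with the link's path prepended.
        return resolve(ROOT, contents.target + path, n - 1)
    return None  # budget exhausted, or descending through an ordinary file
```

Here `resolve(ROOT, ("cfg", "passwords"), 1)` follows one link and reaches index 2, while any path through `loop` comes back undefined once the budget is spent. As in the definition, a soft link that is the last element of a path is returned as is and not followed.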

We have to use a similar trick to make find safe, with some $k > 0$ as the
number of acceptable soft links to traverse.

Extensions such as mounted file systems and union file systems are easy
to add to this model.


3 Coding 

A disk block can be considered to be just a fixed-length sequence of “bytes”
(8 binary digits representing numbers in the set {0 ... 255}). Ordinary files
can be specified on the disk by a number $n$ indicating how many bytes are in
the file and a sequence of disk block numbers $b_1 \ldots b_m$. The file contents is
then the result of concatenating $\mathit{Disk}(b_1) \ldots \mathit{Disk}(b_m)$ and then truncating
to get $n$ bytes of data. If the file is a directory, then this is just the first step
and the second step is to decode the directory from the data. For example,
if file names are composed of 16-bit Unicode, then the contents of a directory
file might be sequences of two-byte quantities coding Unicode characters,
terminated by two 0 bytes, followed by, say, 4 bytes coding an index number.
If “passwords” should be mapped to 34832 then the hexadecimal encoding,
abbreviated here to one byte per character, looks something like this:

70, 61, 73, 73, 77, 6F, 72, 64, 73, 00, 00, 00, 00, 88, 10 
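
To make the layout concrete, here is a hedged Python sketch of one such encoding. It assumes big-endian byte order for both the 16-bit characters and the 4-byte index, which matches the trailing bytes in the listing but is otherwise my choice. The function names `encode_entry` and `decode_directory` are illustrative. Unlike the abbreviated listing, every character here occupies two bytes.

```python
import struct

TERMINATOR = b"\x00\x00"


def encode_entry(name, index):
    """Name as 16-bit big-endian units, two zero bytes, then a 4-byte index."""
    return name.encode("utf-16-be") + TERMINATOR + struct.pack(">I", index)


def decode_directory(data):
    """Decode a concatenation of entries into a dict from names to indexes."""
    entries = {}
    pos = 0
    while pos < len(data):
        end = pos
        # Scan two bytes at a time so the terminator is found on a
        # character boundary.
        while data[end:end + 2] != TERMINATOR:
            if end + 2 > len(data):
                raise ValueError("unterminated name")
            end += 2
        name = data[pos:end].decode("utf-16-be")
        (index,) = struct.unpack(">I", data[end + 2:end + 6])
        entries[name] = index
        pos = end + 6
    return entries
```

Under these assumptions the entry for “passwords” is 24 bytes long: 18 bytes of characters, 2 terminator bytes and the index bytes 00 00 88 10.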

The actual coding is interesting and important, but for this paper, I just
want to sketch out one method for concreteness so that

$$\mathit{file} : \mathit{Integers} \times \mathit{SequencesOfBlockNumbers} \to \mathit{Contents}$$

and

$$\mathit{DecodeDirectory} : \mathit{Contents} \to (\mathit{Strings} \to \mathit{Indexes})$$

seem plausible.

The map $\alpha$ depends on similar encoding. To start, assume we can encode
both the length $n$ and the sequence of block numbers of a file in a single
block. Let’s also encode in that block a “type” that tells us if a file is
ordinary, directory, or soft link. The disk drives that were the design targets
of the UNIX file system could store somewhere around $2^{?}$ bytes of data in
$2^{?}$ blocks. Disk drives at the time of the writing of this paper can easily hold
$2^{42}$ or more bytes in $2^{?}$ blocks. In either case, a single disk block cannot
hold enough disk block numbers for a really big file, so some of the disk block
numbers in the sequence encoded in the block are used as indirect numbers:
they point to blocks that encode more numbers. The details of this are not
covered here; see [?] for explanations.


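$$\begin{aligned}
\alpha &: \mathit{Indexes} \to \mathit{Contents} \\
\mathit{DecodeSequence} &: \mathit{Blocks} \to \mathit{SequencesOfBlockNumbers} \\
\mathit{DecodeSize} &: \mathit{Blocks} \to \mathit{Integers} \\
\mathit{DecodeType} &: \mathit{Blocks} \to \{\mathit{ordinary}, \mathit{directory}, \mathit{softlink}\} \\
\alpha_1 &: \mathit{Indexes} \to \mathit{BlockNumbers} \\
\alpha_2(i) &= \mathit{file}(\mathit{DecodeSize}(\mathit{Disk}(\alpha_1(i))), \mathit{DecodeSequence}(\mathit{Disk}(\alpha_1(i)))) \\
\alpha(i) &= \begin{cases}
\alpha_2(i) & \text{if } \mathit{DecodeType}(\mathit{Disk}(\alpha_1(i))) \in \{\mathit{ordinary}, \mathit{softlink}\} \\
\mathit{DecodeDirectory}(\alpha_2(i)) & \text{if } \mathit{DecodeType}(\mathit{Disk}(\alpha_1(i))) = \mathit{directory} \\
\text{undefined} & \text{otherwise}
\end{cases} \\
\mathit{file}(n, x) &= \mathit{truncate}(n, \mathit{Concatenate}(\mathit{Disk}(x_1) \ldots \mathit{Disk}(x_m)))
\end{aligned}$$

where $x = x_1 \ldots x_m$ is the sequence of block numbers.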
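
The composition can be exercised on a toy disk. The Python sketch below simplifies heavily and should be read as an assumption-laden illustration: blocks are 8 bytes, index blocks are modeled as already-decoded tuples `(type, size, block numbers)` instead of raw bytes, and the directory format is a made-up `name:index` line format. All names and block numbers are hypothetical.

```python
# Toy disk. Data blocks are 8-byte strings; index blocks are modeled as
# already-decoded tuples (type, size in bytes, sequence of block numbers).
disk = {
    1: ("ordinary", 13, [10, 11]),
    2: ("directory", 8, [12]),
    10: b"hello, w",
    11: b"orld\n\x00\x00\x00",   # padded to the block size
    12: b"a.txt:7\n",
}

alpha1 = {0: 2, 7: 1}  # index number -> number of its index block


def file(n, block_numbers):
    """Concatenate the named blocks, then truncate to n bytes."""
    data = b"".join(disk[b] for b in block_numbers)
    return data[:n]


def decode_directory(data):
    """Made-up directory format: one 'name:index' entry per line."""
    entries = {}
    for line in data.decode("ascii").splitlines():
        name, index = line.rsplit(":", 1)
        entries[name] = int(index)
    return entries


def alpha(i):
    if i not in alpha1:
        return None                       # undefined
    ftype, size, sequence = disk[alpha1[i]]
    contents = file(size, sequence)       # the intermediate map alpha_2
    if ftype in ("ordinary", "softlink"):
        return contents
    if ftype == "directory":
        return decode_directory(contents)
    return None
```

Truncation matters here: the second data block carries three padding bytes that are not part of the 13-byte file.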


4 Discussion 

The efficiency advantages of the decomposition above are reasonably obvious
to anyone with an intuition about system programming, but we can also
make an informal complexity analysis. Think of file system maps as finite
sets of pairs: in practice file systems are finite. Searching for a file in a map
$\mathit{Paths}(X) \to \mathit{Contents}$ would, on average, take $n/2$ steps where $n$ is the
number of pairs. This search would involve testing sequences to see if they
are equal on each step. This means each step of the search involves multiple
steps to compute the match. To speed up this search, we need to map
paths to some sorted data rather than an unordered set. That is what the
embedding does. Directory maps involve much simpler matching because we
are matching strings, not sequences, and directories should generally be small.
In practice, file systems can easily contain billions of files, but directories tend
to contain just a few entries. If the average size of a directory is $k$ elements,
then an $n$ element file system will average a depth of only $\log_k(n)$. So for a
file system containing one billion ordinary files with an average of 10 elements
in a directory, 9 steps through the tree would resolve an average path, with
each step requiring comparison of a string (not a path) to an average of 5
other strings, taking us to 45 string matches plus 9 lookups of directories.
Compare that to 500,000,000 path matches.
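
The arithmetic behind these figures can be reproduced in a few lines of Python. The variable names are mine, and the estimate assumes a perfectly balanced tree and a linear scan of each directory.

```python
import math

n = 10**9   # ordinary files in the system
k = 10      # average number of entries per directory

depth = round(math.log(n, k))           # log_k(n) directory levels
comparisons_per_step = k // 2           # average linear scan of one directory
string_matches = depth * comparisons_per_step
flat_matches = n // 2                   # average scan of an unordered path map

print(depth, string_matches, flat_matches)
```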

Consider common queries on the file system such as “find(p)”. This is
efficient to compute using the tree structure. The UNIX command for a
detailed list is also efficient to compute with the embedded tree structure.
Detailed list involves extending out the index block(s) to contain additional
information, such as last modification and security/permission data. In this
case, $\mathcal{N}$ provides an index and $\alpha_1$ provides the block itself. During the 1980s
a number of development groups all made the same discovery that detailed
list was easy to make inefficient for network file systems, because the control
block for each file has to be accessed.

Of course, we could use alternative structures and it may well be that the
tradeoffs have changed sufficiently since the 1970s. Maybe a detailed analysis
of the kinds of file traversals and lookups common in a web site would reveal
a need for a different design. Similarly, disk drives are different both in scale
and operation and flash storage is common. Maybe a hash table would be
more efficient. Modern implementations usually involve an in-memory hash
table used as a cache so that $\mathit{hash}(p) = i$ only if $\mathcal{N}(\mathit{root}, p) = i$. This cache
design amortizes lookup costs.
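
A minimal Python sketch of such a cache follows, assuming the read-only model used throughout, so invalidation on modification is ignored. The dictionary `cache`, the step counter `stats` and the function names are illustrative and not drawn from any particular kernel.

```python
ROOT = 0  # assumed root index

alpha = {
    0: {"etc": 1},
    1: {"passwords": 2},
    2: b"secret\n",
}

cache = {}             # path -> index; only ever filled from resolve()
stats = {"steps": 0}   # counts directory lookups, to show the saving


def resolve(i, path):
    """Recursive descent from index i, one directory lookup per element."""
    for x in path:
        stats["steps"] += 1
        i = alpha[i][x]
    return i


def lookup(path):
    """Return the index for path, walking the tree only on a cache miss."""
    if path in cache:
        return cache[path]
    i = resolve(ROOT, path)
    cache[path] = i
    return i
```

Because entries are only ever stored from the result of `resolve`, a cached value is always one that the tree walk would also produce. The second lookup of the same path costs one hash probe and no directory walks.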

Some of the advantages of the UNIX file-system design are purely
semantic. The recursive structure ensures that there are no holes: no paths
that terminate at a file that skip over inaccessible directories. This property
would require additional work to guarantee in a hash-table implementation if
the architects considered it important. Another set of issues becomes obvious
once we consider modifications.